{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "jTEzoMx6CasV"
      },
      "source": [
        "#### Copyright 2018 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "IhmPj1VVCfWb"
      },
      "outputs": [],
      "source": [
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "YHK6DyunSbs4"
      },
      "source": [
        "# Cat vs. Dog Image Classification\n",
        "## Exercise 3: Feature Extraction and Fine-Tuning\n",
        "**_Estimated completion time: 30 minutes_**\n",
        "\n",
        "In Exercise 1, we built a convnet from scratch, and were able to achieve an accuracy of about 70%. With the addition of data augmentation and dropout in Exercise 2, we were able to increase accuracy to about 80%. That seems decent, but 20% is still too high of an error rate. Maybe we just don't have enough training data available to properly solve the problem. What other approaches can we try?\n",
        "\n",
        "In this exercise, we'll look at two techniques for repurposing feature data generated from image models that have already been trained on large sets of data, **feature extraction** and **fine tuning**, and use them to improve the accuracy of our cat vs. dog classification model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "dI5rmt4UBwXs"
      },
      "source": [
        "## Feature Extraction Using a Pretrained Model\n",
        "\n",
        "One thing that is commonly done in computer vision is to take a model trained on a very large dataset, run it on your own, smaller dataset, and extract the intermediate representations (features) that the model generates. These representations are frequently informative for your own computer vision task, even though the task may be quite different from the problem that the original model was trained on. This versatility and repurposability of convnets is one of the most interesting aspects of deep learning.\n",
        "\n",
        "In our case, we will use the [Inception V3 model](https://arxiv.org/abs/1512.00567) developed at Google, and pre-trained on [ImageNet](http://image-net.org/), a large dataset of web images (1.4M images and 1000 classes). This is a powerful model; let's see what the features that it has learned can do for our cat vs. dog problem.\n",
        "\n",
        "First, we need to pick which intermediate layer of Inception V3 we will use for feature extraction. A common practice is to use the output of the very last layer before the `Flatten` operation, the so-called \"bottleneck layer.\" The reasoning here is that the following fully connected layers will be too specialized for the task the network was trained on, and thus the features learned by these layers won't be very useful for a new task. The bottleneck features, however, retain much generality.\n",
        "\n",
        "Let's instantiate an Inception V3 model preloaded with weights trained on ImageNet:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "1xJZ5glPPCRz"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "from tensorflow.keras import layers\n",
        "from tensorflow.keras import Model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "VaXLMtYiF0t9"
      },
      "source": [
        "Now let's download the weights:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "KMrbllgAFipZ"
      },
      "outputs": [],
      "source": [
        "!wget --no-check-certificate \\\n",
        "    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \\\n",
        "    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "UnRiGBfOF8rq"
      },
      "outputs": [],
      "source": [
        "from tensorflow.keras.applications.inception_v3 import InceptionV3\n",
        "\n",
        "local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'\n",
        "pre_trained_model = InceptionV3(\n",
        "    input_shape=(150, 150, 3), include_top=False, weights=None)\n",
        "pre_trained_model.load_weights(local_weights_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IcYZPBS3bTAj"
      },
      "source": [
        "By specifying the `include_top=False` argument, we load a network that doesn't include the classification layers at the top—ideal for feature extraction."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CFxrqTuJee5m"
      },
      "source": [
        "Let's make the model non-trainable, since we will only use it for feature extraction; we won't update the weights of the pretrained model during training."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "a38rB3lyedcB"
      },
      "outputs": [],
      "source": [
        "for layer in pre_trained_model.layers:\n",
        "  layer.trainable = False"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "XGBGDiOAepnO"
      },
      "source": [
        "The layer we will use for feature extraction in Inception v3 is called `mixed7`. It is not the bottleneck of the network, but we are using it to keep a sufficiently large feature map (7x7 in this case). (Using the bottleneck layer would have resulting in a 3x3 feature map, which is a bit small.) Let's get the output from `mixed7`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "Cj4rXshqbQlS"
      },
      "outputs": [],
      "source": [
        "last_layer = pre_trained_model.get_layer('mixed7')\n",
        "print('last layer output shape:', last_layer.output_shape)\n",
        "last_output = last_layer.output"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "XxHk6XQLeUWh"
      },
      "source": [
        "Now let's stick a fully connected classifier on top of `last_output`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "BMXb913pbvFg"
      },
      "outputs": [],
      "source": [
        "from tensorflow.keras.optimizers import RMSprop\n",
        "\n",
        "# Flatten the output layer to 1 dimension\n",
        "x = layers.Flatten()(last_output)\n",
        "# Add a fully connected layer with 1,024 hidden units and ReLU activation\n",
        "x = layers.Dense(1024, activation='relu')(x)\n",
        "# Add a dropout rate of 0.2\n",
        "x = layers.Dropout(0.2)(x)\n",
        "# Add a final sigmoid layer for classification\n",
        "x = layers.Dense(1, activation='sigmoid')(x)\n",
        "\n",
        "# Configure and compile the model\n",
        "model = Model(pre_trained_model.input, x)\n",
        "model.compile(loss='binary_crossentropy',\n",
        "              optimizer=RMSprop(lr=0.0001),\n",
        "              metrics=['acc'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_6ECjowwV5Ug"
      },
      "source": [
        "For examples and data preprocessing, let's use the same files and `train_generator` as we did in Exercise 2."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Cl-IqOTjZVw_"
      },
      "source": [
        "**NOTE:** The 2,000 images used in this exercise are excerpted from the [\"Dogs vs. Cats\" dataset](https://www.kaggle.com/c/dogs-vs-cats/data) available on Kaggle, which contains 25,000 images. Here, we use a subset of the full dataset to decrease training time for educational purposes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "O4s8HckqGlnb"
      },
      "outputs": [],
      "source": [
        "!wget --no-check-certificate \\\n",
        "   https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip -O \\\n",
        "   /tmp/cats_and_dogs_filtered.zip"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "Fl9XXARuV_eg"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import zipfile\n",
        "\n",
        "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
        "\n",
        "local_zip = '/tmp/cats_and_dogs_filtered.zip'\n",
        "zip_ref = zipfile.ZipFile(local_zip, 'r')\n",
        "zip_ref.extractall('/tmp')\n",
        "zip_ref.close()\n",
        "\n",
        "# Define our example directories and files\n",
        "base_dir = '/tmp/cats_and_dogs_filtered'\n",
        "train_dir = os.path.join(base_dir, 'train')\n",
        "validation_dir = os.path.join(base_dir, 'validation')\n",
        "\n",
        "# Directory with our training cat pictures\n",
        "train_cats_dir = os.path.join(train_dir, 'cats')\n",
        "\n",
        "# Directory with our training dog pictures\n",
        "train_dogs_dir = os.path.join(train_dir, 'dogs')\n",
        "\n",
        "# Directory with our validation cat pictures\n",
        "validation_cats_dir = os.path.join(validation_dir, 'cats')\n",
        "\n",
        "# Directory with our validation dog pictures\n",
        "validation_dogs_dir = os.path.join(validation_dir, 'dogs')\n",
        "\n",
        "train_cat_fnames = os.listdir(train_cats_dir)\n",
        "train_dog_fnames = os.listdir(train_dogs_dir)\n",
        "\n",
        "# Add our data-augmentation parameters to ImageDataGenerator\n",
        "train_datagen = ImageDataGenerator(\n",
        "    rescale=1./255,\n",
        "    rotation_range=40,\n",
        "    width_shift_range=0.2,\n",
        "    height_shift_range=0.2,\n",
        "    shear_range=0.2,\n",
        "    zoom_range=0.2,\n",
        "    horizontal_flip=True)\n",
        "\n",
        "# Note that the validation data should not be augmented!\n",
        "val_datagen = ImageDataGenerator(rescale=1./255)\n",
        "\n",
        "train_generator = train_datagen.flow_from_directory(\n",
        "        train_dir, # This is the source directory for training images\n",
        "        target_size=(150, 150),  # All images will be resized to 150x150\n",
        "        batch_size=20,\n",
        "        # Since we use binary_crossentropy loss, we need binary labels\n",
        "        class_mode='binary')\n",
        "\n",
        "# Flow validation images in batches of 20 using val_datagen generator\n",
        "validation_generator = val_datagen.flow_from_directory(\n",
        "        validation_dir,\n",
        "        target_size=(150, 150),\n",
        "        batch_size=20,\n",
        "        class_mode='binary')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "qEC1AL7iVRLz"
      },
      "source": [
        "Finally, let's train the model using the features we extracted. We'll train on all 2000 images available, for 2 epochs, and validate on all 1,000 validation images."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "Blhq2MAUeyGA"
      },
      "outputs": [],
      "source": [
        "history = model.fit_generator(\n",
        "      train_generator,\n",
        "      steps_per_epoch=100,\n",
        "      epochs=2,\n",
        "      validation_data=validation_generator,\n",
        "      validation_steps=50,\n",
        "      verbose=2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "lRjyAkE62aOG"
      },
      "source": [
        "You can see that we reach a validation accuracy of 88–90% very quickly. This is much better than the small model we trained from scratch."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "tt15y6IS2pBo"
      },
      "source": [
        "## Further Improving Accuracy with Fine-Tuning\n",
        "\n",
        "In our feature-extraction experiment, we only tried adding two classification layers on top of an Inception V3 layer. The weights of the pretrained network were not updated during training. One way to increase performance even further is to \"fine-tune\" the weights of the top layers of the pretrained model alongside the training of the top-level classifier. A couple of important notes on fine-tuning:\n",
        "\n",
        "- **Fine-tuning should only be attempted *after* you have trained the top-level classifier with the pretrained model set to non-trainable**. If you add a randomly initialized classifier on top of a pretrained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier), and your pretrained model will just forget everything it has learned.\n",
        "- Additionally, we **fine-tune only the *top layers* of the pre-trained model** rather than all layers of the pretrained model because, in a convnet, the higher up a layer is, the more specialized it is. The first few layers in a convnet learn very simple and generic features, which generalize to almost all types of images. But as you go higher up, the features are increasingly specific to the dataset that the model is trained on. The goal of fine-tuning is to adapt these specialized features to work with the new dataset.\n",
        "\n",
        "All we need to do to implement fine-tuning is to set the top layers of Inception V3 to be trainable, recompile the model (necessary for these changes to take effect), and resume training. Let's unfreeze all layers belonging to the `mixed7` module—i.e., all layers found after `mixed6`—and recompile the model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "_l_J4S0Z2rgg"
      },
      "outputs": [],
      "source": [
        "from tensorflow.keras.optimizers import SGD\n",
        "\n",
        "unfreeze = False\n",
        "\n",
        "# Unfreeze all models after \"mixed6\"\n",
        "for layer in pre_trained_model.layers:\n",
        "  if unfreeze:\n",
        "    layer.trainable = True\n",
        "  if layer.name == 'mixed6':\n",
        "    unfreeze = True\n",
        "\n",
        "# As an optimizer, here we will use SGD \n",
        "# with a very low learning rate (0.00001)\n",
        "model.compile(loss='binary_crossentropy',\n",
        "              optimizer=SGD(\n",
        "                  lr=0.00001, \n",
        "                  momentum=0.9),\n",
        "              metrics=['acc'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zE37ARlqY9da"
      },
      "source": [
        "Now let's retrain the model. We'll train on all 2000 images available, for 50 epochs, and validate on all 1,000 validation images. (This may take 15-20 minutes to run.)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "o_GgDGG4Y_hJ"
      },
      "outputs": [],
      "source": [
        "history = model.fit_generator(\n",
        "      train_generator,\n",
        "      steps_per_epoch=100,\n",
        "      epochs=50,\n",
        "      validation_data=validation_generator,\n",
        "      validation_steps=50,\n",
        "      verbose=2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "3EPGn58ofwq5"
      },
      "source": [
        "We are seeing a nice improvement, with the validation loss going from ~1.7 down to ~1.2, and accuracy going from 88% to 92%. That's a 4.5% relative improvement in accuracy.\n",
        "\n",
        "Let's plot the training and validation loss and accuracy to show it conclusively:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "1FtxcKjJfxL9"
      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "import matplotlib.image as mpimg\n",
        "\n",
        "# Retrieve a list of accuracy results on training and validation data\n",
        "# sets for each training epoch\n",
        "acc = history.history['acc']\n",
        "val_acc = history.history['val_acc']\n",
        "\n",
        "# Retrieve a list of list results on training and validation data\n",
        "# sets for each training epoch\n",
        "loss = history.history['loss']\n",
        "val_loss = history.history['val_loss']\n",
        "\n",
        "# Get number of epochs\n",
        "epochs = range(len(acc))\n",
        "\n",
        "# Plot training and validation accuracy per epoch\n",
        "plt.plot(epochs, acc)\n",
        "plt.plot(epochs, val_acc)\n",
        "plt.title('Training and validation accuracy')\n",
        "\n",
        "plt.figure()\n",
        "\n",
        "# Plot training and validation loss per epoch\n",
        "plt.plot(epochs, loss)\n",
        "plt.plot(epochs, val_loss)\n",
        "plt.title('Training and validation loss')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "X-fUIeizakjE"
      },
      "source": [
        "Congratulations! Using feature extraction and fine-tuning, you've built an image classification model that can identify cats vs. dogs in images with over 90% accuracy."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "x_ANwJCnx7w-"
      },
      "source": [
        "## Clean Up\n",
        "\n",
        "Run the following cell to terminate the kernel and free memory resources:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {
          "autoexec": {
            "startup": false,
            "wait_interval": 0
          }
        },
        "colab_type": "code",
        "id": "-hUmyohAyBzh"
      },
      "outputs": [],
      "source": [
        "import os, signal\n",
        "os.kill(os.getpid(), signal.SIGKILL)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [
        "jTEzoMx6CasV"
      ],
      "default_view": {},
      "name": "image_classification_part3.ipynb",
      "provenance": [],
      "version": "0.3.2",
      "views": {}
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
